Spoken Document Confidence Estimation Using Contextual Coherence

Authors

  • Taichi Asami
  • Narichika Nomoto
  • Satoshi Kobashikawa
  • Yoshikazu Yamaguchi
  • Hirokazu Masataki
  • Satoshi Takahashi
Abstract

Selecting well-recognized transcripts is critical if information retrieval systems are to extract business intelligence from massive spoken document databases. To achieve this goal, we target spoken document confidence measures that represent the recognition rate of each document. We focus on the incoherent word occurrences that appear over several utterances in ill-recognized transcripts of spoken documents. The proposed method uses contextual coherence as a measure of spoken document confidence. The contextual coherence is formulated as the mean of pointwise mutual information (PMI). We also propose a smoothing method for PMI, which deals with the data sparseness problem. Compared with the conventional method, our smoothing technique improves the correlation coefficient between spoken document confidence scores and recognition rates from 0.573 to 0.672. Moreover, an even higher correlation coefficient, 0.710, is achieved by combining the context-based and decoder-based confidence measures.
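The idea of scoring a transcript by its mean PMI can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`train_counts`, `pmi`, `contextual_coherence`), the use of document-level co-occurrence counts, the choice to average over all distinct word pairs in a transcript, and the add-k smoothing, which merely stands in for the authors' own smoothing method (not described in the abstract).

```python
import math
from collections import Counter
from itertools import combinations


def train_counts(documents):
    """Count document-level occurrences of words and unordered word pairs."""
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        vocab = sorted(set(doc))  # each word counted once per document
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))  # pairs stored as (a, b), a < b
    return word_counts, pair_counts, len(documents)


def pmi(w1, w2, word_counts, pair_counts, n_docs, k=0.5):
    """PMI with add-k smoothing (assumed stand-in, not the paper's smoothing).

    k is added to each of the four cells of the 2x2 co-occurrence table,
    so marginals gain 2k and the total gains 4k.
    """
    a, b = sorted((w1, w2))
    total = n_docs + 4 * k
    p_xy = (pair_counts[(a, b)] + k) / total
    p_x = (word_counts[a] + 2 * k) / total
    p_y = (word_counts[b] + 2 * k) / total
    return math.log(p_xy / (p_x * p_y))


def contextual_coherence(transcript, word_counts, pair_counts, n_docs, k=0.5):
    """Mean smoothed PMI over all distinct word pairs in one transcript."""
    pairs = list(combinations(sorted(set(transcript)), 2))
    if not pairs:
        return 0.0  # fewer than two distinct words: no pairs to score
    scores = [pmi(a, b, word_counts, pair_counts, n_docs, k) for a, b in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy training corpus (hypothetical data for illustration only).
    corpus = [
        ["bank", "loan", "interest"],
        ["bank", "loan", "rate"],
        ["soccer", "goal", "match"],
        ["soccer", "match", "team"],
    ]
    wc, pc, n = train_counts(corpus)
    print(contextual_coherence(["bank", "loan"], wc, pc, n))
    print(contextual_coherence(["bank", "soccer"], wc, pc, n))
```

In this toy setting, a transcript whose words co-occurred in the training corpus receives a positive score, while one mixing words from unrelated topics receives a negative score. That ordering mirrors the intuition in the abstract that misrecognized transcripts contain incoherent word occurrences.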


Similar resources

Advances in speechfind: transcript reliability estimation employing confidence measure based on discriminative sub-word model for SDR

This study presents recent advances in our spoken document retrieval (SDR) system SpeechFind, including our partnership with the Collaborative Digitization Program (CDP). A prototype of SpeechFind for the CDP is currently serving as the search engine for 1,300 hours of the CDP audio content. This audio corpus of spoken documents possesses a wide range of conditions which make speech recogniti...


CRF-based combination of contextual features to improve a posteriori word-level confidence measures

The paper addresses the issue of confidence measure reliability provided by automatic speech recognition systems for use in various spoken language processing applications. In this context, a conditional random field (CRF)-based combination of contextual features is proposed to improve word-level confidence measures. More precisely, the method consists in combining phonetic, lexical, linguistic ...


Probabilistic concept verification for language understanding in spoken dialogue systems

In past research, several kinds of information have been explored to assess the confidence measure or to select the confidence tag for a word/phrase. However, contextual confidence information has received little attention. In this paper, we propose a concept-based probabilistic verification model to integrate the contextual confidence information. In this model, a concept is verified not only ac...


Syllable-Based Chinese Text/Spoken Document Retrieval Using Text/Speech Queries

In order to solve the problem with the fast growth of Chinese information resources on the Internet, this paper deals with the problem of Chinese text and spoken document retrieval using both text and speech queries. By properly utilizing the monosyllabic structure of Chinese language, the proposed approach performs the statistical similarity estimation between the text/speech queries and the t...


Augmented set of features for confidence estimation in spoken term detection

Discriminative confidence estimation along with confidence normalisation have been shown to construct robust decision maker modules in spoken term detection (STD) systems. Discriminative confidence estimation, making use of term-dependent features, has been shown to improve the widely used lattice-based confidence estimation in STD. In this work, we augment the set of these term-dependent featur...



Publication date: 2011